DETAILED ACTION 



Claim Rejections - 35 USC § 103 
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

1. Claims 1-6, 10-16, and 35-36 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Mitchell (US Patent No. 5,799,273) in view of Ball et al (US Patent No. 6,240,391). 

2. Regarding claims 1 and 35, Mitchell discloses a system for relating words in an audio file 
to words in a text file, comprising: retrieving a text file comprising a plurality of textual words 
(col. 6, lines 20-29); generating an audio file comprising a plurality of audible words based on 
the text file (col. 6, lines 9-19); storing information relating each audible word to a 
corresponding textual word (col. 6, lines 48-65); and an electronic marker that indicates the 
position of the audible word within the text file (abstract; col. 9, lines 13-25 in which Mitchell 
discloses the user can delete and/or insert text and the recognition interface updates the links 
between the recognized word and the associated audio components such that link data is 
amended to indicate the correct character position of the word in the text). 
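For illustration only, the link-updating behavior attributed to Mitchell above (amending link data so that each word keeps its correct character position after text is inserted or deleted) can be sketched as follows. This is a hypothetical sketch, not Mitchell's implementation: the dictionary of word IDs to character offsets and the function names are assumptions introduced here.

```python
from typing import Dict


def shift_links_after_insert(links: Dict[int, int], insert_pos: int,
                             inserted_len: int) -> Dict[int, int]:
    """Hypothetical: update word-ID -> character-offset links after an insertion.

    Words starting at or after the insertion point move right by inserted_len;
    words before it keep their offsets.
    """
    return {word_id: (pos + inserted_len if pos >= insert_pos else pos)
            for word_id, pos in links.items()}


def shift_links_after_delete(links: Dict[int, int], delete_pos: int,
                             deleted_len: int) -> Dict[int, int]:
    """Hypothetical: update links after deleting deleted_len characters at delete_pos."""
    end = delete_pos + deleted_len
    updated: Dict[int, int] = {}
    for word_id, pos in links.items():
        if pos < delete_pos:
            updated[word_id] = pos                 # before the deletion: unchanged
        elif pos >= end:
            updated[word_id] = pos - deleted_len   # after the deletion: shift left
        # words starting inside the deleted span are dropped (their text is gone)
    return updated
```

For the text "please call home" with links {0: 0, 1: 7, 2: 12}, inserting four characters at position 7 moves words 1 and 2 to offsets 11 and 16, while deleting "call " (position 7, length 5) drops word 1 and moves word 2 to offset 7.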

Mitchell does not disclose that the electronic marker is within the audio file; however, 
storing markers that link portions of audio and portions of text within the audio file was well 
known in the art, and it would have been obvious to one of ordinary skill at the time of the 
invention to provide the electronic marker embedded in the audio file that indicates the 
position of the audible word within the text file, so as to aid the user in reviewing the text as the 
audio is output. 

Mitchell does not teach that the audio file is generated by converting the textual words to a 
plurality of audible words, or that the audio file is transmitted or made available to a user of a 
telecommunications device. Ball (col. 4, line 66 to col. 5, line 3; col. 6, lines 23-47; col. 6, 
line 51 to col. 7, line 6) discloses a system and method for assembling and presenting structured 
voice mail messages, which implements a telephone/IP server for providing the functions of audio 
play and record, text-to-speech synthesis, dual-tone multi-frequency (DTMF) (touch-tone) 
recognition, automatic speech recognition (ASR) processing, and other call control functions 
necessary for interactive audio services. Ball specifically teaches that the textual messaging 
elements of the structured voice mail message are converted to a speech signal by a text-to-speech 
processor and combined with each other and with audio fragments converted from their data files. 

It would have been obvious to one of ordinary skill at the time of the invention to modify 
the system of Mitchell to provide the linked audio and text data for transmission to a user of a 
telecommunications device, in order to generate structured voice mail messages that provide 
the voice mail recipient the ability to access any audio, text, video or multi-media type message. 

Regarding claim 2, Mitchell discloses the textual words comprise ASCII text (col. 5, lines 
59-67). 

Regarding claim 3, Mitchell discloses the audio file is stored in the form of a WAV file 
(col. 6, lines 9-29; col. 13, lines 26-30). 

Regarding claim 4, Mitchell discloses the information comprises voice tags embedded in 
the audio file (col. 7, lines 1-30). 
Regarding claim 5, Mitchell discloses the information comprises a file map relating a 
location of each textual word within the text file to a location of the corresponding audible word 
in the audio file (col. 6, line 48 to col. 8, line 3). 
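As a hypothetical illustration of the kind of file map recited in claim 5 (and in claim 15), the sketch below pairs each textual word's character position in the text file with the time span of the corresponding audible word in the audio file, and looks up the text position of the word being heard at a given audio time. The entry layout, the per-word durations, and all names are assumptions for illustration; they are not taken from Mitchell or from the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileMapEntry:
    """Hypothetical file-map row locating one word in both files."""
    word: str
    text_offset: int     # character position of the textual word in the text file
    audio_start_ms: int  # start of the corresponding audible word in the audio file
    audio_end_ms: int    # end of the corresponding audible word in the audio file


def build_file_map(text: str, durations_ms: List[int]) -> List[FileMapEntry]:
    """Pair each whitespace-delimited word with an assumed per-word audio duration."""
    words = text.split()
    if len(words) != len(durations_ms):
        raise ValueError("need exactly one audio duration per textual word")
    entries: List[FileMapEntry] = []
    search_from = 0
    clock = 0
    for word, dur in zip(words, durations_ms):
        offset = text.index(word, search_from)  # location within the text file
        entries.append(FileMapEntry(word, offset, clock, clock + dur))
        search_from = offset + len(word)
        clock += dur                            # audible words are laid end to end
    return entries


def text_offset_at(file_map: List[FileMapEntry], t_ms: int) -> Optional[int]:
    """Return the text position of the word playing at audio time t_ms, if any."""
    for entry in file_map:
        if entry.audio_start_ms <= t_ms < entry.audio_end_ms:
            return entry.text_offset
    return None
```

For "please call home" with assumed durations of 400, 300, and 500 ms, the word heard at 450 ms is "call", located at character offset 7.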

Regarding claims 6 and 36, Mitchell discloses the method steps are performed by logic 
embodied in a computer readable medium (col. 4, line 66 to col. 5, line 36). 

Regarding claim 15, Mitchell discloses a system for relating words in an audio file to 
words in a text file, comprising: retrieving a text file comprising a textual word (col. 6, lines 
20-29); generating an audible word corresponding to the textual word (col. 6, lines 9-19); storing 
the audible word in an audio file (col. 6, lines 9-29; col. 13, lines 26-30); and storing a file map, 
the file map comprising: a first location locating the audible word within the audio file (Figures 
3-4; col. 6, line 48 to col. 7, line 30); and a second location locating the textual word within the 
text file (Figures 3-4; col. 6, line 48 to col. 7, line 30). 

Mitchell does not teach that the audio file is generated by converting the textual words to a 
plurality of audible words, or that the audio file is transmitted or made available to a user of a 
telecommunications device. Ball (col. 4, line 66 to col. 5, line 3; col. 6, lines 23-47; col. 6, 
line 51 to col. 7, line 6) discloses a system and method for assembling and presenting structured 
voice mail messages, which implements a telephone/IP server for providing the functions of audio 
play and record, text-to-speech synthesis, dual-tone multi-frequency (DTMF) (touch-tone) 
recognition, automatic speech recognition (ASR) processing, and other call control functions 
necessary for interactive audio services. Ball specifically teaches that the textual messaging 
elements of the structured voice mail message are converted to a speech signal by a text-to-speech 
processor and combined with each other and with audio fragments converted from their data files. 
It would have been obvious to one of ordinary skill at the time of the invention to modify 
the system of Mitchell to provide the linked audio and text data for transmission to a user of a 
telecommunications device, in order to generate structured voice mail messages that provide the 
voice mail recipient the ability to access any audio, text, video or multi-media type message. 

Regarding claim 16, Mitchell discloses repeating the steps of the method for a plurality of 
textual words in the text file (col. 5, line 59 to col. 8, line 3; Figures 3-4). 

Regarding claim 10, Mitchell discloses a system for relating words in an audio file to 
words in a text file, comprising: retrieving a text file comprising a plurality of textual words 
(col. 6, lines 20-29); generating an audible word corresponding to each textual word (col. 6, 
lines 9-29); playing the audible words to a user in real time as the audible words are generated 
(col. 8, line 52 to col. 10, line 2); and, during the playing of the audible words, determining a 
current textual word corresponding to the audible word currently being played (col. 8, line 52 to 
col. 10, line 2). 

Mitchell does not teach that the audio file is generated by converting the textual words to a 
plurality of audible words with each audible word comprising media stream packets, or that the 
audio file is transmitted or made available to a user of a telecommunications device. Ball (col. 4, 
line 66 to col. 5, line 3; col. 6, lines 23-47; col. 6, line 51 to col. 7, line 6) discloses a system 
and method for assembling and presenting structured voice mail messages, which implements a 
telephone/IP server (thus providing for the transmission of data that comprises media 
stream packets) for providing the functions of audio play and record, text-to-speech synthesis, 
dual-tone multi-frequency (DTMF) (touch-tone) recognition, automatic speech recognition 
(ASR) processing, and other call control functions necessary for interactive audio services, and 



Application/Control Number: 10/020,102 Page 6 

Art Unit: 2626 

specifically teaches that the textual messaging elements of the structured voice mail message are 
converted to a speech signal by a text-to-speech processor and combined with each other and with 
audio fragments converted from their data files. 

It would have been obvious to one of ordinary skill at the time of the invention to modify 
the system of Mitchell to provide the linked audio media stream packets and text data for 
transmission to a user of a telecommunications device, in order to generate structured voice mail 
messages that provide the voice mail recipient the ability to access any audio, text, video or 
multi-media type message that has been stored. 

Regarding claim 11, Mitchell discloses the textual words comprise ASCII text (col. 5, 
lines 59-67). 

Regarding claim 12, Mitchell discloses initializing a counter identifying textual words 
within the text file (col. 6, line 48 to col. 7, line 30); and incrementing the counter after each 
audible word is played (col. 6, line 48 to col. 7, line 30); wherein the step of determining 
comprises identifying the current textual word using the counter (col. 6, line 48 to col. 7, line 
30). 

Regarding claim 13, Mitchell discloses storing information about the audible word, the 
information comprising: an identifier for the textual word corresponding to the audible word (col. 
6, line 48 to col. 8, line 3); and a time at which the audible word was played (col. 6, line 48 to 
col. 8, line 3; Figures 3-4). 
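Purely as an illustration of the counter-based tracking recited in claims 12 and 13, the sketch below initializes a counter at the first textual word, uses it to identify the current textual word, records that word's identifier with the time playback began, and increments the counter after each audible word is played. The `play` callback stands in for a text-to-speech output step and, like the injectable `clock`, is a hypothetical assumption; no real speech engine is invoked.

```python
import time
from typing import Callable, List, Tuple


def play_and_track(words: List[str],
                   play: Callable[[str], None],
                   clock: Callable[[], float] = time.monotonic
                   ) -> List[Tuple[int, str, float]]:
    """Hypothetical sketch: play words in order, tracking the current word with a counter.

    Returns a log of (word index, textual word, time playback began) tuples.
    """
    log: List[Tuple[int, str, float]] = []
    counter = 0                                  # initialize the counter
    while counter < len(words):
        current = words[counter]                 # current textual word, via the counter
        log.append((counter, current, clock()))  # identifier plus time of playback
        play(current)                            # stand-in for audible output
        counter += 1                             # increment after the word is played
    return log
```

Passing a fake clock makes the log deterministic, which is convenient when checking that each logged identifier matches the word actually played.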

Regarding claim 14, Mitchell discloses the method steps are performed by logic 
embodied in a computer readable medium (col. 4, line 66 to col. 5, line 36). 
5. Claims 7-8, 17, 30, and 32-33 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Mitchell and Ball in view of Dionne (US Patent No. 6,068,487) and further in view of 
Frulla et al (US Patent No. 6,424,357). 

6. Regarding claims 7, 17, 30 and 32, Mitchell discloses a system for relating words in an 
audio file to words in a text file, comprising: retrieving a text file comprising a textual word 
(col. 6, lines 20-29); generating an audible word corresponding to the textual word (col. 6, lines 
9-19); storing the audible word in an audio file (col. 6, lines 9-29; col. 13, lines 26-30); and 
storing a file map, the file map comprising: a first location locating the audible word within the 
audio file (Figures 3-4; col. 6, line 48 to col. 7, line 30); and a second location locating the 
textual word within the text file (Figures 3-4; col. 6, line 48 to col. 7, line 30). 

Mitchell does not teach that the audio file is generated by converting the textual words to a 
plurality of audible words, or that the audio file is transmitted or made available to a user of a 
telecommunications device. Ball (col. 4, line 66 to col. 5, line 3; col. 6, lines 23-47; col. 6, 
line 51 to col. 7, line 6) discloses a system and method for assembling and presenting structured 
voice mail messages, which implements a telephone/IP server for providing the functions of audio 
play and record, text-to-speech synthesis, dual-tone multi-frequency (DTMF) (touch-tone) 
recognition, automatic speech recognition (ASR) processing, and other call control functions 
necessary for interactive audio services. Ball specifically teaches that the textual messaging 
elements of the structured voice mail message are converted to a speech signal by a text-to-speech 
processor and combined with each other and with audio fragments converted from their data files. 
It would have been obvious to one of ordinary skill at the time of the invention to modify the 
system of Mitchell to provide the linked audio and text data for transmission to a user of a 
telecommunications device, in order to generate structured voice mail messages that provide the 
voice mail recipient the ability to access any audio, text, video or multi-media type message. 

Mitchell does not teach that the system identifies an audible word to be spelled in 
response to the command to spell, identifies a textual word in a text file corresponding to the 
audible word to be spelled, and audibly spells the textual word. Dionne teaches a method for 
having a reading machine spell a word, which includes retrieving a word to be spelled, 
displaying the letters of the word, spelling the word, and providing a text-to-speech output of the 
word (col. 3, lines 8-34). Dionne teaches that the system is useful in assisting individuals with 
learning disabilities or severe visual impairments. 

It would have been obvious to one of ordinary skill at the time of the invention to modify 
the system of Mitchell to provide the spelling of words in the text, to aid in the editing of 
recognized text and in the correcting of recognition errors, for the purpose of assisting 
individuals with visual impairments with editing of text. 

Mitchell and Dionne do not teach that the command is input to the system via a voice 
command. However, implementation of voice commands to allow for system functionality and 
control similar to that of hand-controlled input devices was well known in the art. 

Frulla discloses a voice input system having a microphone coupled to a computing device, 
with the computing device typically operating a computer software application. A user speaks 
voice commands into the microphone, and a voice command module operating on the computing 
device interprets the voice command and causes the graphical or non-graphical application 
to be commanded and controlled consistent with the use of a physical mouse (Figures 2-3; col. 
4, lines 57-64). Frulla specifically teaches that the system is advantageous in environments in 
which it is inconvenient or impractical to use a mouse, thereby making the user interface more 
convenient and efficient for a user to input information and commands. 

It would have been obvious to one of ordinary skill at the time of the invention to modify 
the system of Mitchell to provide the spelling of words in the text, to aid in the editing of 
recognized text and in the correcting of recognition errors and to further provide voice command 
control, as suggested by Frulla, for the purpose of making the user interface more convenient and 
efficient for a user to input information and commands in situations in which using a physical 
mouse is impractical or cumbersome. 

Regarding claims 8 and 33, Mitchell discloses repeating the steps of the method for a plurality of 
textual words in the text file (col. 5, line 59 to col. 8, line 3; Figures 3-4). 

7. Claims 18-29 and 37 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Mitchell in view of Dionne (US Patent No. 6,068,487) and further in view of Frulla et al (US 
Patent No. 6,424,357). 

8. Regarding claims 18-29 and 37, Mitchell discloses a system which provides a user 
interface for relating words in an audio file to words in a text file, comprising: retrieving a text 
file comprising a textual word (col. 6, lines 20-29); generating an audible word corresponding to 
the textual word (col. 6, lines 9-19); storing the audible word in an audio file (col. 6, lines 9-29; 
col. 13, lines 26-30); and storing a file map, the file map comprising: a first location locating the 
audible word within the audio file (Figures 3-4; col. 6, line 48 to col. 7, line 30); and a second 
location locating the textual word within the text file (Figures 3-4; col. 6, line 48 to col. 7, line 30). 

Mitchell does not teach that the system identifies an audible word to be spelled in 
response to the command to spell, identifies a textual word in a text file corresponding to the 
audible word to be spelled, and audibly spells the textual word. Dionne teaches a method for 
having a reading machine spell a word, which includes retrieving a word to be spelled, 
displaying the letters of the word, spelling the word, and providing a text-to-speech output of the 
word (col. 3, lines 8-34). Dionne teaches that the system is useful in assisting individuals with 
learning disabilities or severe visual impairments. 

It would have been obvious to one of ordinary skill at the time of the invention to modify 
the system of Mitchell to provide the spelling of words in the text, to aid in the editing of 
recognized text and in the correcting of recognition errors, for the purpose of assisting 
individuals with visual impairments with editing of text. 

Mitchell and Dionne do not teach that the command is input to the system via a voice 
command. However, implementation of voice commands to allow for system functionality and 
control similar to that of hand-controlled input devices was well known in the art. 

Frulla discloses a voice input system having a microphone coupled to a computing device, 
with the computing device typically operating a computer software application. A user speaks 
voice commands into the microphone, and a voice command module operating on the computing 
device interprets the voice command and causes the graphical or non-graphical application 
to be commanded and controlled consistent with the use of a physical mouse (Figures 2-3; col. 
4, lines 57-64). Frulla specifically teaches that the system is advantageous in environments in 
which it is inconvenient or impractical to use a mouse, thereby making the user interface more 
convenient and efficient for a user to input information and commands. 

It would have been obvious to one of ordinary skill at the time of the invention to modify 
the system of Mitchell to provide the spelling of words in the text, to aid in the editing of 
recognized text and in the correcting of recognition errors and to further provide voice command 
control, as suggested by Frulla, for the purpose of making the user interface more convenient and 
efficient for a user to input information and commands in situations in which using a physical 
mouse is impractical or cumbersome. 

Response to Arguments 
9. Applicant's arguments with respect to claims 10-17 have been considered but are moot in 
view of the new ground(s) of rejection. 

Applicant's arguments with respect to claims 7-8, 17-33, and 37 have been fully 
considered but they are not persuasive. Applicant argues that Mitchell and Dionne do not disclose 
input of voice commands. Applicant also argues that Frulla is limited to implementing simple 
computer mouse functions using voice commands and that therefore the cited references do not 
teach or suggest receiving a voice command from a user to spell an audible word. In response to 
applicant's arguments against the references individually, one cannot show nonobviousness by 
attacking references individually where the rejections are based on combinations of references. 
See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 
231 USPQ 375 (Fed. Cir. 1986). 
In response to applicant's argument that there is no suggestion to combine the references, 
the examiner recognizes that obviousness can only be established by combining or modifying the 
teachings of the prior art to produce the claimed invention where there is some teaching, 
suggestion, or motivation to do so found either in the references themselves or in the knowledge 
generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 
USPQ2d 1596 (Fed. Cir. 1988) and In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992). 
In this case, Kurzweil (US Patent No. 6,199,042) teaches linking the audio portions with the 
corresponding text portions of a document (col. 5, lines 11-26). 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Angela A. Armstrong whose telephone number is 571-272-7598. 
The examiner can normally be reached on Monday-Thursday 11:30-8:00 PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Patrick Edouard can be reached on 571-272-7603. The fax phone number for the 
organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would 
like assistance from a USPTO Customer Service Representative or access to the automated 
information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 
Primary Examiner 